{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "408f38cd",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/AI4Finance-Foundation/FinRL-Meta/blob/master/Demo_China_A_share_market.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef0f3a7e",
   "metadata": {
    "id": "ef0f3a7e"
   },
   "source": [
    "## Quantitative trading in China A stock market with FinRL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "Q8gKimq2PZDh",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "id": "Q8gKimq2PZDh",
    "outputId": "ea18fe12-1b5e-492e-fa4a-53f5fa776694"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "import sys\n",
    "from copy import deepcopy\n",
    "\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import numpy as np\n",
    "import numpy.random as rd\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fb79afd",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "zDejJbjYQuUi",
    "outputId": "13f9a389-1113-4a67-b962-68a29ece6c21",
    "scrolled": true
   },
   "source": [
    "### Download training data\n",
    "Download training data (2.1MB) from github: `https://github.com/Yonv1943/Python/blob/master/scow`\n",
    "And save in current working path :`.`\n",
    "\n",
    "训练数据是中国A股 1113 天的股票价格，以及其他与交易相关的因子。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7caf60c7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2022-04-02 16:30:34--  https://github.com/Yonv1943/Python/blob/master/scow/China_A_shares.pandas.dataframe\n",
      "Resolving github.com (github.com)... 20.205.243.166\n",
      "Connecting to github.com (github.com)|20.205.243.166|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: unspecified [text/html]\n",
      "Saving to: ‘China_A_shares.pandas.dataframe.3’\n",
      "\n",
      "China_A_shares.pand     [ <=>                ] 120.73K  --.-KB/s    in 0.1s    \n",
      "\n",
      "2022-04-02 16:30:35 (1.12 MB/s) - ‘China_A_shares.pandas.dataframe.3’ saved [123624]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!wget https://github.com/Yonv1943/Python/blob/master/scow/China_A_shares.pandas.dataframe"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6978812c",
   "metadata": {},
   "source": [
    "### 金融仿真环境部分 FinRL\n",
    "相关代码来自[FinRL-Meta Demo_China_A_share_market](https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/master/Demo_China_A_share_market.ipynb)\n",
    "\n",
    "我们将这1113天的中国A股数据，简单地分成：\n",
    "- 训练集占75%，0~834天\n",
    "- 测试集站25% 834~1113天\n",
    "分钟级甚至秒级的数据的数据量将会很多，我们这里只是一个简单的教程，因为数据太少，所以我们没有划分验证集（validation），请读者注意。\n",
    "\n",
    "这个的交易训练环境 env（或者说仿真器 simulator）的超参数配置如下：\n",
    "\n",
    "```\n",
    "class StockTradingEnv:\n",
    "    def __init__(self, \n",
    "        initial_amount=1e6,  # 初始本金\n",
    "        max_stock=1e2,  # 最大交易额度，买入或卖出100个单位\n",
    "        buy_cost_pct=1e-3,  # 交易损耗率设为 0.001\n",
    "        sell_cost_pct=1e-3,  # 交易损耗率设为 0.001\n",
    "        gamma=0.99,  # 强化学习的折扣比率，给人为设置的终止状态的reward进行补偿的时候会用到\n",
    "        beg_idx=0,   # 使用数据集的这个位置之后的数据\n",
    "        end_idx=1113  # 使用数据集的这个位置之前的数据\n",
    "     ):\n",
    "```\n",
    "\n",
    "cumulative_returns 表示从开始到结束智能体交易获得的收益。为了方便，我们直接显示了「本金的增长倍数」。\n",
    "如 1.19 表示在一段时间的交易后，本金增长了1.19倍。\n",
    "- random action 表示交易动作是随机的 `action=rd.uniform(-1, 1, action_dim)`，强化学习训练得到的智能体不应该比这个分数低。\n",
    "- buy all share 表示一直使用最大的额度买入所有股票 `action=np.ones(action_dim)`，能一定程度上反映大盘走势\n",
    "\n",
    "我们为了方便演示和学习，选择了较少的数据，从而把训练时间压缩到1000秒内。实际上，训练数据需要再多一个数量级。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "5de98a09",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "| StockTradingEnv: close_ary.shape (279, 15)\n",
      "| StockTradingEnv: tech_ary.shape (279, 120)\n",
      "\n",
      "cumulative_returns of random action:      1.24\n",
      "cumulative_returns of random action:      1.28\n",
      "cumulative_returns of random action:      1.53\n",
      "cumulative_returns of random action:      1.29\n",
      "\n",
      "cumulative_returns of buy all share:      1.19\n",
      "cumulative_returns of buy all share:      1.19\n",
      "cumulative_returns of buy all share:      1.19\n",
      "cumulative_returns of buy all share:      1.19\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from demo_FinRL_ElegantRL_China_A_shares import StockTradingEnv\n",
    "from demo_FinRL_ElegantRL_China_A_shares import check_env\n",
    "\n",
    "check_env()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54077e6a",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "H0-reEAYJTkU",
    "outputId": "2d6895dc-93a4-4661-934d-8bf1cd857e9c"
   },
   "source": [
    "### 深度强化学习部分 ElegantRL\n",
    "\n",
    "相关代码来自[ElegantRL helloworld](https://github.com/AI4Finance-Foundation/ElegantRL/tree/master/elegantrl_helloworld)\n",
    "\n",
    "我们使用ElegantRL库的代码，连接上FinRL的仿真环境，加载中国A股的数据，调用`run()`函数，训练一个自动交易智能体。\n",
    "\n",
    "`run()` 函数将会用到以下内容：\n",
    "\n",
    "```\n",
    "# env.py\n",
    "class StockTradingEnv()  # 来自FinRL库的交易仿真环境\n",
    "def build_env()  # 创建训练仿真环境\n",
    "def get_gym_env_args()  # 获得仿真环境的参数\n",
    "\n",
    "# net.py\n",
    "class ActorPPO  # PPO算法的策略网络\n",
    "class CriticPPO  # PPO算法的价值网络\n",
    "\n",
    "# agent.py\n",
    "class AgentPPO  # PPO算法的主体\n",
    "class ReplayBufferList  # 经验回放缓存（存放强化学习的训练数据）\n",
    "\n",
    "# run.py\n",
    "class Arguments  # 强化学习的超参数（可以看这个类的注释了解超参数的作用）\n",
    "def train_agent()  # 训练强化学习智能体的函数\n",
    "```\n",
    "\n",
    "**记得选则你训练用的GPU卡ID**，填上0表示用ID编号为0 的GPU卡`run(gpu_id=0)`。如果没有GPU，可以填`-1`，或者不做任何处理，代码会自己选择。\n",
    "\n",
    "我们用的数据比较少，用笔记本也能在500秒内完成训练，想要缩短训练时间，可以把训练步数改小`args.break_step = int(5e5)`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42ac7297",
   "metadata": {
    "id": "42ac7297"
   },
   "source": [
    "### Import modules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "d1cd1fbd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "| StockTradingEnv: close_ary.shape (1113, 15)\n",
      "| StockTradingEnv: tech_ary.shape (1113, 120)\n",
      "| Arguments Remove cwd: ./StockTradingEnv-v2_PPO_-1\n",
      "| StockTradingEnv: close_ary.shape (834, 15)\n",
      "| StockTradingEnv: tech_ary.shape (834, 120)\n",
      "Step:5.00e+03  ExpR:    0.08  Returns:    1.55  ObjC:    3.70  ObjA:    0.05  \n",
      "Step:6.00e+04  ExpR:    0.32  Returns:    3.04  ObjC:    3.16  ObjA:   -0.02  \n",
      "Step:1.15e+05  ExpR:    0.40  Returns:    3.27  ObjC:    1.67  ObjA:   -0.05  \n",
      "Step:1.70e+05  ExpR:    0.42  Returns:    4.00  ObjC:    1.31  ObjA:    0.17  \n",
      "Step:2.25e+05  ExpR:    0.50  Returns:    4.12  ObjC:    0.97  ObjA:    0.16  \n",
      "Step:2.80e+05  ExpR:    0.58  Returns:    4.10  ObjC:    0.99  ObjA:    0.01  \n",
      "Step:3.35e+05  ExpR:    0.62  Returns:    4.66  ObjC:    1.25  ObjA:    0.03  \n",
      "Step:3.90e+05  ExpR:    0.67  Returns:    4.74  ObjC:    1.02  ObjA:    0.20  \n",
      "Step:4.45e+05  ExpR:    0.73  Returns:    5.90  ObjC:    0.90  ObjA:   -0.13  \n",
      "Step:5.00e+05  ExpR:    0.77  Returns:    5.50  ObjC:    0.72  ObjA:    0.22  \n",
      "| UsedTime: 276 | SavedDir: ./StockTradingEnv-v2_PPO_-1\n"
     ]
    }
   ],
   "source": [
    "from demo_FinRL_ElegantRL_China_A_shares import StockTradingEnv\n",
    "from demo_FinRL_ElegantRL_China_A_shares import get_gym_env_args\n",
    "from demo_FinRL_ElegantRL_China_A_shares import AgentPPO\n",
    "from demo_FinRL_ElegantRL_China_A_shares import Arguments\n",
    "from demo_FinRL_ElegantRL_China_A_shares import train_agent\n",
    "\n",
    "def run(gpu_id=-1):\n",
    "    env = StockTradingEnv()\n",
    "    env_func = StockTradingEnv\n",
    "    env_args = get_gym_env_args(env=env, if_print=False)\n",
    "    env_args['beg_idx'] = 0  # training set\n",
    "    env_args['end_idx'] = 834  # training set\n",
    "\n",
    "    args = Arguments(AgentPPO, env_func=env_func, env_args=env_args)\n",
    "    args.target_step = args.max_step * 4\n",
    "    args.reward_scale = 2 ** -7\n",
    "    args.learning_rate = 2 ** -14\n",
    "    args.break_step = int(5e5)\n",
    "\n",
    "    args.learner_gpus = gpu_id\n",
    "    args.random_seed += gpu_id + 1943\n",
    "    train_agent(args)\n",
    "\n",
    "run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f503803d",
   "metadata": {},
   "outputs": [],
   "source": [
    "训练结束后，会有很多策略网络的模型文件保存在 `| Arguments Remove cwd: ./StockTradingEnv-v2_PPO_-1`\n",
    "    \n",
    "你也可以在训练前，修改这个超参数，自己选择保存目录位置`args.cwd=./current_working_directory`\n",
    "\n",
    "我接下来使用函数`evaluate_models_in_directory(dir_path='./StockTradingEnv-v2_PPO_-1')`评估保存在目录中的模型文件。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "fluid-taylor",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "fluid-taylor",
    "outputId": "19584d54-5357-49ae-bc06-96bf66434867",
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "| evaluate_models_in_directory: gpu_id -1\n",
      "| evaluate_models_in_directory: dir_path ./StockTradingEnv-v2_PPO_-1\n",
      "| StockTradingEnv: close_ary.shape (279, 15)\n",
      "| StockTradingEnv: tech_ary.shape (279, 120)\n",
      "cumulative_returns     1.093  actor_00000000004998_00000003_00000.08.pth\n",
      "cumulative_returns     1.067  actor_00000000059976_00000033_00000.32.pth\n",
      "cumulative_returns     1.107  actor_00000000114954_00000062_00000.40.pth\n",
      "cumulative_returns     1.090  actor_00000000169932_00000093_00000.42.pth\n",
      "cumulative_returns     1.080  actor_00000000224910_00000122_00000.50.pth\n",
      "cumulative_returns     1.147  actor_00000000279888_00000152_00000.58.pth\n",
      "cumulative_returns     1.257  actor_00000000334866_00000183_00000.62.pth\n",
      "cumulative_returns     1.295  actor_00000000389844_00000213_00000.67.pth\n",
      "cumulative_returns     1.316  actor_00000000444822_00000243_00000.73.pth\n",
      "cumulative_returns     1.299  actor_00000000499800_00000273_00000.77.pth\n"
     ]
    }
   ],
   "source": [
    "from demo_FinRL_ElegantRL_China_A_shares import evaluate_models_in_directory\n",
    "\n",
    "evaluate_models_in_directory(dir_path='./StockTradingEnv-v2_PPO_-1')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb601f4a",
   "metadata": {
    "id": "eb601f4a"
   },
   "source": [
    "cumulative_returns 表示从开始到结束智能体交易获得的收益。为了方便，我们直接显示了「本金的增长倍数」。\n",
    "\n",
    "`actor_00000000004998_00000003_00000.08.pth` 是训练时保存的策略网络模型文件 `actor`，表示它是在环境中采样4998步，训练3秒，探索环境的得分是0.08的一个模型，它的实际得分是 1.093，表示在一段时间的交易后，本金增长了1.093倍。\n",
    "\n",
    "欢迎使用 [FinRL-Meta Demo_China_A_share_market](https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/master/Demo_China_A_share_market.ipynb) 以及 [ElegantRL helloworld](https://github.com/AI4Finance-Foundation/ElegantRL/tree/master/elegantrl_helloworld) 的开源代码做出比这个例子更好的东西。\n",
    "\n",
    "如果想要交流更多与金融强化学习相关的话题，我们推荐：\n",
    "- 英文社区 [FinRL的 Slack channel](https://join.slack.com/t/ai4financeworkspace/shared_invite/zt-v670l1jm-dzTgIT9fHZIjjrqprrY0kg)\n",
    "- 中文社区 [深度强化学习实验室网站：FinR L板块](http://deeprlhub.com/t/finrl)。这是讨论深度强化学习的网站，有FinRL板块。与QQ群不同的，讨论的记录，很方便查看。\n",
    "- QQ群 “金融强化学习 FinRL”（群号341070204），它是 “深度强化学习 ElegantRL”的姊妹群，原来的群长期处于满2000人的状态，所以我在2022-03-31日创建了这个新群。新群有了解金融强化学习的多位成员，一起讨论，一起成长。 "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce11979d",
   "metadata": {
    "id": "ce11979d"
   },
   "source": [
    "### Authors\n",
    "github username YonV1943，知乎 曾伊言"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "Demo_China_A_share_market.ipynb",
   "provenance": []
  },
  "interpreter": {
   "hash": "d9161d898770fc09463e4d697d40f3619206d01233bb8ab45ec62b646b625d48"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": false,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "notify_time": "5",
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
